    Efficient Data Representation by Selecting Prototypes with Importance Weights

    Prototypical examples that best summarize and compactly represent an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper we present algorithms with strong theoretical guarantees to mine these data sets and select prototypes, a.k.a. representatives, that optimally describe them. Our work notably generalizes the recent work by Kim et al. (2016): in addition to selecting prototypes, we also associate with them non-negative weights that indicate their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e. outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys the key property of weak submodularity, we present a fast ProtoDash algorithm and derive approximation guarantees for it. We demonstrate the efficacy of our method on diverse domains such as retail, digit recognition (MNIST) and 40 publicly available health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively based on expert feedback and recently published scientific studies on public health, thus showcasing the power of our technique in providing actionability (for retail), utility (for MNIST) and insight (on CDC datasets), which arguably are the hallmarks of an effective data mining method.
    Comment: Accepted for publication in International Conference on Data Mining (ICDM) 201
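The idea of greedily selecting prototypes and attaching non-negative importance weights can be illustrated with a minimal sketch. It is not the authors' implementation: the objective `mu.w - 0.5 * w.K.w`, the Gaussian kernel, the coordinate-ascent weight update, and all names (`select_weighted_prototypes`, `bandwidth`, `sweeps`) are assumptions made for illustration, and the data set being summarized doubles as the candidate pool.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel, one example of a symmetric positive definite kernel."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def select_weighted_prototypes(points, num_prototypes, bandwidth=1.0, sweeps=200):
    """Greedy sketch: pick prototypes and non-negative weights (assumed objective)."""
    n = len(points)
    K = [[gaussian_kernel(a, b, bandwidth) for b in points] for a in points]
    # mu[j] is the mean kernel similarity of candidate j to the whole data set.
    mu = [sum(row) / n for row in K]
    weights = [0.0] * n
    selected = []
    for _ in range(num_prototypes):
        # Gradient of the assumed objective mu.w - 0.5 * w.K.w at the current w.
        grad = [mu[j] - sum(K[j][k] * weights[k] for k in selected) for j in range(n)]
        best = max((j for j in range(n) if j not in selected), key=lambda j: grad[j])
        selected.append(best)
        # Re-fit the weights on the selected set by coordinate ascent,
        # clipping at zero to keep every weight non-negative.
        for _ in range(sweeps):
            for j in selected:
                rest = sum(K[j][k] * weights[k] for k in selected if k != j)
                weights[j] = max(0.0, (mu[j] - rest) / K[j][j])
    return [(j, weights[j]) for j in selected]


# Toy data: a cluster of four points near the origin and three points near (5, 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
prototypes = select_weighted_prototypes(points, num_prototypes=2)
for index, weight in prototypes:
    print(points[index], round(weight, 3))
```

On this toy data the sketch picks one prototype per cluster, and the larger cluster receives the larger weight, which is the sense in which the weights indicate importance.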

    Signal Recovery in Perturbed Fourier Compressed Sensing

    In many applications in compressed sensing, the measurement matrix is a Fourier matrix, i.e., it measures the Fourier transform of the underlying signal at some specified 'base' frequencies $\{u_i\}_{i=1}^M$, where $M$ is the number of measurements. However, due to system calibration errors, the system may measure the Fourier transform at frequencies $\{u_i + \delta_i\}_{i=1}^M$ that are different from the base frequencies and where $\{\delta_i\}_{i=1}^M$ are unknown. Ignoring perturbations of this nature can lead to major errors in signal recovery. In this paper, we present a simple but effective alternating minimization algorithm to recover the perturbations in the frequencies in situ with the signal, which we assume is sparse or compressible in some known basis. In many cases, the perturbations $\{\delta_i\}_{i=1}^M$ can be expressed in terms of a small number of unique parameters $P \ll M$. We demonstrate that in such cases, the method leads to excellent quality results that are several times better than baseline algorithms (which are based on existing off-grid methods in the recent literature on direction of arrival (DOA) estimation, modified to suit the computational problem in this paper). Our results are also robust to noise in the measurement values. We also provide theoretical results for (1) the convergence of our algorithm, and (2) the uniqueness of its solution under some restrictions.
    Comment: New theoretical results about uniqueness and convergence now included. More challenging experiments now include
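The measurement model described above can be made concrete with a small sketch. It only illustrates how perturbed frequencies change the measurements. It does not implement the alternating minimization algorithm. The normalization by the signal length, the sizes `n` and `m`, the perturbation range, and the helper names `fourier_measurements` and `relative_mismatch` are assumptions chosen for illustration.

```python
import cmath
import random


def fourier_measurements(signal, freqs):
    """Fourier transform of `signal` sampled at arbitrary (possibly off-grid) frequencies."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * u * t / n) for t, x in enumerate(signal))
        for u in freqs
    ]


def relative_mismatch(predicted, observed):
    """Euclidean distance between two measurement vectors, relative to the observed norm."""
    num = sum(abs(p - q) ** 2 for p, q in zip(predicted, observed)) ** 0.5
    den = sum(abs(q) ** 2 for q in observed) ** 0.5
    return num / den


rng = random.Random(0)
n, m = 64, 16  # signal length and number of measurements (illustrative sizes)

# A sparse signal with three nonzero entries.
signal = [0.0] * n
for idx in (5, 20, 41):
    signal[idx] = 1.0

base = [rng.uniform(0, n) for _ in range(m)]        # base frequencies u_i
delta = [rng.uniform(-0.5, 0.5) for _ in range(m)]  # unknown perturbations delta_i

# What the system actually measures versus what a model that ignores delta predicts.
observed = fourier_measurements(signal, [u + d for u, d in zip(base, delta)])
assumed = fourier_measurements(signal, base)

mismatch = relative_mismatch(assumed, observed)
print(f"relative mismatch when perturbations are ignored: {mismatch:.3f}")
```

A recovery method that uses the unperturbed model has to absorb this mismatch as if it were noise, which is why estimating the perturbations jointly with the signal matters.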